(54) Method and apparatus for detecting a fault condition in a computer processor

(57) A method for detecting a fault condition in a computer processor operating a main control program,
comprising the steps of sequentially performing a plurality of functions (F1, F2, F3, F4, F5) on an initial
input value so as to compute a final value (G), the input value to each of the second and subsequent
functions being provided by the output value from the preceding function in the sequence; loading at least
one self-test module (X, Y) onto the computer processor for detecting whether a fault condition has occurred
in the computer processor, wherein at least one of the functions is carried out within a self-test module
(X, Y); and comparing the computed final value (G) with a predetermined value to provide an indication of
whether a fault condition has occurred in the computer processor. The invention also relates to an apparatus
for detecting a fault condition in a computer processor.







EP 1 063 591 A2 
Description 

[0001] The invention relates to a method and apparatus for detecting a fault condition in a computer processor. 
[0002] In computer operated systems, it is desirable to be able to detect when a fault or malfunction has occurred
in the computer processor. In particular, the detection of a fault is vitally important in safety-critical computer
processor applications, such as in aircraft computer systems. A known method for detecting a fault or malfunction in
a computer processor utilises a timer counter, commonly referred to as a "watchdog timer". The timer counter receives
a clocked input pulse of predetermined frequency and the count of the timer counter is incremented each time a pulse
of the clocked input is applied. In the event that the count reaches a pre-set maximum count, the timer counter
generates an output pulse.

[0003] The computer processor is programmed with a self-test module which checks whether the computer processor is
performing correctly. Periodically, a signal derived from the self-test module is supplied by the processor to the
reset input on the timer counter to reset the counter. Provided the computer processor is functioning correctly, the
timer counter therefore does not reach the pre-set maximum count and does not provide an output. If a fault occurs in
the computer processor, the reset signal is not provided to the timer counter and, on reaching the predetermined
count, the timer counter generates an output pulse, the generation of the output pulse thus signifying that a fault
has occurred in the computer processor.

[0004] A disadvantage of this fault detection method is that when a fault occurs in the computer processor the
signal provided by the processor to the timer counter may become "stuck" so that the reset signal is continuously
supplied to the timer counter. Thus, even though a fault may have occurred in the computer processor, an output will
not be provided by the timer counter to indicate that there is a fault.
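
By way of illustration only, the masking effect of a stuck reset signal can be sketched in Python. The class name
`SimpleWatchdog`, the maximum count of 3 and the three signal traces are assumptions chosen for the example and do not
appear in the specification.

```python
class SimpleWatchdog:
    """Timer counter that latches a fault when max_count is reached without a reset."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0
        self.fault = False

    def tick(self, reset_line):
        """Apply one clocked input pulse; reset_line is the level of the reset input."""
        if reset_line:
            self.count = 0
            return
        self.count += 1
        if self.count >= self.max_count:
            self.fault = True


def run(reset_trace, max_count=3):
    """Feed a trace of reset-line levels to a fresh watchdog and report its fault output."""
    watchdog = SimpleWatchdog(max_count)
    for level in reset_trace:
        watchdog.tick(level)
    return watchdog.fault


healthy = [False, False, True] * 4   # periodic reset from a working processor
halted = [False] * 12                # processor stopped, no reset at all
stuck = [True] * 12                  # faulty processor with the reset line stuck active

print(run(healthy), run(halted), run(stuck))
```

In this sketch the halted processor is detected, but the stuck reset line produces the same fault output as the
healthy processor, which is the weakness noted above.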

[0005] A more sophisticated type of watchdog timer is described in US 5 073 853. Using this method, the computer
runs a self-test module and the signal supplied by the self-test module alternates between two values. Each value is
derived from the preceding value by a calculation performed by the computer processor. The alternating signal is
supplied to an input of a comparator which provides a reset signal to the watchdog timer only if the correct sequence
of values is received at the comparator input. Using this method the correct sequence of reset signals cannot be
produced if the computer processor has failed. In addition, the watchdog timer described in US 5 073 853 includes a
"window timer", arranged such that the watchdog timer responds to the reset signal only if the signal is received
within a predetermined time window. Any signals received outside the predetermined time window are regarded as faults
and a fault output is generated.

[0006] Another known method for detecting a computer fault is described in US 5 257 373, in which a control program
is loaded onto the processor and performs a number of separate functions on an input value. After each function of
the control program has been completed, a software check is made to determine whether the function was executed
correctly and, if so, a counter associated with that function is incremented accordingly. At the end of the sequence,
the count in each counter is checked in software and only in the event that all the counters have incremented
correctly will a reset signal be provided to a watchdog timer.

[0007] A disadvantage of this method is that the final checking step in the procedure (i.e. the checking of the
counter contents) is performed in software and thus is itself vulnerable to computer failure. In addition, the counter
contents are not cleared during computer processing so that the control program may become "stuck", thereby causing an
erroneous reset signal to be provided to the timer counter even in the event of a fault.

[0008] It is an object of the present invention to provide a method for detecting a fault condition in a computer
processor which has an improved fault detection capability.

[0009] According to the present invention there is provided a method for detecting a fault condition in a computer
processor, comprising the steps of:

sequentially performing a plurality of functions on an initial input value so as to compute a final value, the input
value to each of the second and subsequent functions being provided by the output value from the preceding function
in the sequence;

loading at least one self-test module onto the computer processor for detecting whether a fault condition has
occurred in the computer processor, wherein at least one of the functions is carried out within a self-test module;
and

comparing the computed final value with a predetermined value to provide an indication of whether a fault condition
has occurred in the computer processor.

[0010] Each of the functions must be performed, and in the correct sequence, for a correspondence to be obtained.
Thus, the method has an improved fault condition detection capability. By distributing the functions throughout the
control program the method can be used to check whether the various steps of the program are being performed in their
correct sequence. Furthermore, by performing at least one of the functions within a self-test module, a check is made
on the functioning of the self-test module itself.
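
By way of illustration only, the sensitivity of the final value to omitted or reordered functions can be sketched in
Python. The five functions, the 16-bit mask and the seed value of 1 are hypothetical choices made for this example;
the specification does not prescribe any particular functions.

```python
MASK = 0xFFFF  # assumption: values are held in 16 bits

# Hypothetical functions, chosen so that neighbouring steps do not commute.
STEPS = {
    "F1": lambda v: (v + 7) & MASK,
    "F2": lambda v: (v * 3) & MASK,
    "F3": lambda v: (v ^ 0x00FF) & MASK,
    "F4": lambda v: (v + 0x0111) & MASK,
    "F5": lambda v: (v * 5) & MASK,
}
ORDER = ["F1", "F2", "F3", "F4", "F5"]


def run(order, seed=1):
    """Apply the named functions in the given order, each taking the previous output."""
    value = seed
    for name in order:
        value = STEPS[name](value)
    return value


EXPECTED = run(ORDER)  # the predetermined value for this seed

# Final value when exactly one function is omitted.
skipped = {name: run([n for n in ORDER if n != name]) for name in ORDER}

# Final value when F3 and F4 are executed in the wrong order.
swapped = run(["F1", "F2", "F4", "F3", "F5"])

print(EXPECTED, skipped, swapped)
```

For these particular functions every omission, and the exchange of F3 and F4, gives a final value different from the
predetermined value. The choice of functions matters: two steps that commute with one another, such as two additions,
would not reveal that they had been exchanged.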

[0011] The computed final value may be made up of two secondary computed values, a correspondence being
obtained when the secondary computed values are generated in a required sequence.

[0012] Conveniently, the self-test modules are provided within the main control program operated by the computer. 
[0013] The method preferably includes the further steps of:

generating a service pulse if the computed final value is equivalent to the predetermined value;

generating a time window;

detecting whether the service pulse is received within the time window; and

generating a fault condition output if the service pulse is received outside of the time window.

[0014] Thus, a fault can be detected even if the computed final value becomes "stuck" at the correct value, as the
subsequent service pulse must be received within the time window for a valid service to be registered. If the service
pulse is received before the time window has been started, or after expiry thereof, a fault condition output is
generated to indicate that a fault has occurred in the computer processor.
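
By way of illustration only, the time window test can be sketched in Python. The function name `classify_service`,
the time units and the window placement (start at 100, length 20) are assumptions made for the example.

```python
def classify_service(pulse_time, window_start, window_length):
    """Classify a service pulse by its arrival time relative to the time window."""
    window_end = window_start + window_length
    if pulse_time < window_start:
        return "fault: before window"
    if pulse_time >= window_end:
        return "fault: after window"
    return "valid service"


# Hypothetical window opening 100 time units after the previous service, 20 units long.
print(classify_service(105, 100, 20))  # pulse inside the window
print(classify_service(10, 100, 20))   # e.g. a "stuck" correct value written again too soon
print(classify_service(130, 100, 20))  # pulse arriving after the window has expired
```

A final value that is stuck at the correct value would typically be written again too early or too late, so in this
sketch it is classified as a fault even though the value itself matches.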

[0015] Alternatively, the method may include the further steps of:

incrementing a count of counter means, the counter means providing a fault condition output in the event that a
pre-set count is reached; and

changing the count of the counter means in response to a correspondence between the computed final value and
the predetermined value, such that, in the event that no such correspondence occurs, the counter means provides
a fault condition output, thereby indicating that a fault condition has occurred in the computer processor.

The count is preferably reset to a zero count in response to a correspondence between the computed final value and
the predetermined value.

[0016] According to another aspect of the invention, there is provided an apparatus for detecting a fault condition
in a computer processor comprising:

means for sequentially performing a plurality of functions on an input value so as to compute a final value, the
input value to each of the second and subsequent functions being provided by the output value from the preceding
function in the sequence;

at least one self-test module, loaded onto the computer processor, for detecting whether a fault condition has
occurred in the computer processor, wherein at least one of the functions is carried out within a self-test module;
and

means for comparing the computed final value with a predetermined value to provide an indication of whether a
fault condition has occurred in the computer processor.

[0017] The apparatus preferably includes means for generating a service pulse if the computed final value is
equivalent to the predetermined value, means for generating a time window, and means for detecting whether the
service pulse is received within the time window, whereby receipt of the service pulse outside the time window
results in generation of a fault condition output.

[0018] For the purpose of this specification, the occurrence of a fault or functional error in a computer processor
shall be referred to as a "fault condition".
[0019] In the accompanying drawings:

Figure 1 is a schematic diagram of a conventional watchdog timer for use in a method of detecting a fault condition
in a computer processor; and

Figure 2 is a flow diagram to illustrate the method of the present invention.

[0020] With reference to Figure 1, a computer processor (not shown) is programmed with software for performing
a particular operation such as, for example, controlling hardware. Conventionally, in order to check whether the
computer processor is operating correctly, the computer periodically runs a self-test module which provides an output
signal 20 each time the self-test module is executed correctly. The output signal 20 is supplied to a watchdog timer
to determine whether the computer processor is operating correctly, as will now be described.

[0021] In the watchdog timer, the output signal 20 is supplied to a comparator 25 where it is compared with a
predetermined key number 30 input to the comparator 25. If the output signal 20 is equal to the key number 30, the
comparator 25 generates a first output pulse 26, referred to as a service pulse, and if the output signal 20 is not
equal to the key number 30, the comparator 25 generates a second output pulse 28. If a second output pulse 28 is
generated, this is supplied to an OR gate 70 which outputs a fault detection pulse 80 to indicate that a fault has
occurred.

[0022] The watchdog timer also includes a counter 32, referred to as a delay counter, which receives a clocked
input pulse 35 from a clock oscillator 40 operating at a predetermined frequency. The clocked input pulse 35
increments the count stored in the delay counter 32 each time an input pulse is received. After a predetermined count
has been reached, the delay counter 32 provides an output signal 45 to trigger a window generator 50 which, on receipt
of the signal 45, generates a time window 55 which is supplied to a window comparator 60.

[0023] The window comparator 60 also receives the service pulse 26 if the correct output signal 20 has been
supplied by the self-test module. If a subsequent service pulse 26 arrives at the window comparator 60 outside the
time window 55, generated by the window generator 50 on receipt of the preceding service pulse, the window comparator
60 outputs a first output pulse 62 to the OR gate 70. In addition, if the time window 55 generated by the window
generator 50 expires before the window comparator 60 has received the subsequent service pulse 26 from the comparator
25, the window comparator 60 generates a second output pulse 64 which is also provided to the OR gate 70. If the OR
gate 70 receives a second output pulse 28 from the comparator 25 to indicate that an incorrect output signal 20 has
been received, or an output pulse 62 from the window comparator 60 to indicate that the subsequent service pulse was
not received within the time window 55, or an output pulse 64 from the window comparator 60 to indicate that the time
window 55 expired before the subsequent service pulse was received, a fault output signal 80 is generated to indicate
that a fault has occurred in the computer processor.

[0024] The output pulse 26 generated by the comparator 25 in response to a "correct" output signal 20 is also
supplied to the delay counter 32 and the window generator 50 to reset these counters ready for the next output signal
20.
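
By way of illustration only, the behaviour of the watchdog timer of Figure 1 can be modelled in Python. The model
merges the delay counter 32 and the window generator 50 into a single tick count, latches the fault output, and uses
a key of 0xA5, a delay of 8 ticks and a window of 4 ticks; all of these are simplifying assumptions made for the
example.

```python
class WindowWatchdog:
    """Simplified model of Figure 1: comparator, delay counter, window and OR gate."""

    def __init__(self, key, delay_ticks, window_ticks):
        self.key = key                    # predetermined key number 30
        self.delay_ticks = delay_ticks    # count at which the time window 55 opens
        self.window_ticks = window_ticks  # length of the time window 55
        self.elapsed = 0                  # ticks since the last service pulse 26
        self.fault = False                # fault output 80, latched in this model

    def tick(self):
        """Apply one clocked input pulse 35."""
        self.elapsed += 1
        if self.elapsed >= self.delay_ticks + self.window_ticks:
            self.fault = True             # pulse 64: window expired without service

    def write(self, value):
        """Present output signal 20 to the comparator 25."""
        if value != self.key:
            self.fault = True             # pulse 28: incorrect value
            return
        window_end = self.delay_ticks + self.window_ticks
        if not self.delay_ticks <= self.elapsed < window_end:
            self.fault = True             # pulse 62: service pulse outside window
        self.elapsed = 0                  # service pulse 26 resets the counters


def simulate(period, services, value=0xA5, key=0xA5, delay=8, window=4):
    """Write `value` every `period` ticks and report the fault output."""
    watchdog = WindowWatchdog(key, delay, window)
    for _ in range(services):
        for _ in range(period):
            watchdog.tick()
        watchdog.write(value)
    return watchdog.fault


print(simulate(10, 5), simulate(3, 5), simulate(15, 5), simulate(10, 5, value=0x5A))
```

In this sketch only the first case, in which the correct value arrives inside the window each time, leaves the fault
output inactive.
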
[0025] Referring to Figure 2, there is shown a flow diagram to illustrate the method of the present invention which
has an improved fault detection capability. The method may be implemented using the watchdog timer 100 of the type
generally described with reference to Figure 1.

[0026] A computer processor (not shown) runs a control program for performing a desired operation, such as
controlling hardware on an aircraft. The processor is also programmed with two self-test modules, referred to as (X)
and (Y), to periodically check the operation of the processor. The self-test modules are distributed at suitable
places throughout the control program so that, as the processor performs its usual processing steps, it periodically
encounters the self-test modules.

[0027] In addition, five routines are distributed throughout the control program to perform functions F1, F2, F3, F4
and F5. The first routine receives an input value A and performs a first function F1, thereby generating a value B.
Value B becomes the input value for the second routine which performs function F2 and generates an output value C
which forms the input to a third routine which performs function F3, generating a value D. In addition, self-test
module X includes a sub-routine to perform function F4 on value D, the output of which (value E) is input to a
function F5 within self-test module Y. The final output value, G, generated from function F5, is therefore derived by
sequentially performing a number of functions on the original input value A, or "seed number". The output value, G, is
referred to as the "key number" and can be expressed mathematically by the following equation:

key number = F5 (F4 (F3 (F2 (F1 (seed number)))))
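
By way of illustration only, the calculation of the key number can be sketched in Python, with functions F1 to F3 in
routines of the control program and functions F4 and F5 inside self-test modules X and Y. The function bodies, the
16-bit mask and the seed value of 1 are hypothetical; the specification does not define them.

```python
MASK = 0xFFFF  # assumption: values are held in 16 bits


def f1(value):
    return (value + 7) & MASK


def f2(value):
    return (value * 3) & MASK


def f3(value):
    return (value ^ 0x00FF) & MASK


def self_test_x(value_d):
    """Self-test module X: processor checks would run here, then function F4."""
    return (value_d + 0x0111) & MASK


def self_test_y(value_e):
    """Self-test module Y: processor checks would run here, then function F5."""
    return (value_e * 5) & MASK


def control_program_cycle(seed):
    """One pass of the control program, carrying values A to G between routines."""
    b = f1(seed)        # first routine: A -> B
    c = f2(b)           # second routine: B -> C
    d = f3(c)           # third routine: C -> D
    e = self_test_x(d)  # F4 within self-test module X: D -> E
    g = self_test_y(e)  # F5 within self-test module Y: E -> G
    return g


SEED = 1
KEY_NUMBER = control_program_cycle(SEED)
print(KEY_NUMBER)
```

In a real control program the routines would be separated by ordinary processing steps; they are called back to back
here only to keep the sketch short.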

[0028] The key number, G, computed in software is then supplied to the watchdog timer 100, where it is compared
with a predetermined value, i.e. the key number, G, takes the place of the output signal 20 shown in Figure 1. As
described previously, the watchdog timer 100 includes a delay counter, driven by a clocked input pulse 135, the
watchdog timer generating a fault output signal 180 in the event of an incorrect key number being calculated, or in
the event that the window comparator is serviced with a service pulse outside the time window, or in the event that
the time window expires before a service pulse is received.

[0029] With reference to Figure 2, only if all of the functions F1 to F5 have been executed, and in the correct
sequence, will the computed key number be equal to the predetermined key number required by the watchdog timer 100
for a valid service. Thus, if any one of the functions F1 to F5 is not implemented, or if the functions are not
implemented in the correct order, the key number computed in software will not match the key number required for a
valid service, and a fault condition output signal 180 will be generated to provide an indication that a fault has
occurred in the computer processor.

[0030] Furthermore, as the window comparator generates an output pulse in the event of either a service pulse 
being received after expiry of the time window, or in the event that the time window expires before a service pulse is 
received, it is possible to identify fault conditions arising from both premature and delayed service pulses. 

[0031] Each time the key number is written to the watchdog timer 100, the key number is immediately overwritten
in software and a new key number is computed for the next servicing of the counter. The immediate overwriting of the
key number in this way does leave the remote possibility that a fault which occurs due to the overwrite step being
skipped can go undetected. This problem can be overcome in several ways. For example, the method can be adapted
such that two key numbers must be written consecutively and in the correct order in order to achieve a valid service.
The same memory location is used for both key number write operations, such that writing the second number to the
memory location ensures the first key number no longer resides in that location.
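
By way of illustration only, the two-key variant can be sketched in Python. The class `TwoKeyWatchdog`, the key
values and the dictionary standing in for the shared memory location are assumptions made for the example.

```python
FIRST_KEY, SECOND_KEY = 0x1234, 0xABCD  # hypothetical key numbers


class TwoKeyWatchdog:
    """Registers a valid service only when both keys are written consecutively, in order."""

    def __init__(self, first_key, second_key):
        self.expected = (first_key, second_key)
        self.position = 0    # index of the key expected next
        self.services = 0    # number of valid services registered
        self.fault = False   # latched fault condition output

    def write(self, value):
        if value != self.expected[self.position]:
            self.fault = True
            self.position = 0
            return
        self.position += 1
        if self.position == len(self.expected):
            self.services += 1
            self.position = 0


def service(watchdog, overwrite_skipped=False):
    """Write both keys to the watchdog through one shared memory location."""
    memory = {"key": FIRST_KEY}
    watchdog.write(memory["key"])
    if not overwrite_skipped:
        memory["key"] = SECOND_KEY  # the second key replaces the first in the same location
    watchdog.write(memory["key"])


healthy = TwoKeyWatchdog(FIRST_KEY, SECOND_KEY)
service(healthy)

faulty = TwoKeyWatchdog(FIRST_KEY, SECOND_KEY)
service(faulty, overwrite_skipped=True)

print(healthy.services, healthy.fault, faulty.services, faulty.fault)
```

When the overwrite is skipped in this sketch, the stale first key is presented twice and the second write is
rejected, so the skipped step no longer goes undetected.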

[0032] Alternatively, the problem of skipped overwrite can be overcome by arranging for the key number required
for a valid service to alternate cyclically between two different values. Thus, the predetermined input 30 to the
comparator 25 must be varied cyclically between two different values. In this case it is necessary to compute two key
numbers alternately throughout the control program, each being calculated by a series of sequential functions, as
described hereinbefore for a single key number, and with both key numbers being written to one memory location. Once
again, the writing of the second key number ensures that the first number no longer resides in the memory location.
However, this latter solution has an increased complexity over the former solution, as it is necessary to keep
synchronisation between the key numbers calculated by the software and the key numbers required by the counter.
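
By way of illustration only, a comparator whose required key alternates between two values can be sketched in
Python. The class name, the key values and the decision to leave the expected key unchanged after a mismatch are
assumptions; the specification does not state how the comparator behaves after a rejected write.

```python
KEY_A, KEY_B = 0x1234, 0xABCD  # hypothetical alternating key numbers


class AlternatingComparator:
    """Comparator whose predetermined input alternates between two key numbers."""

    def __init__(self, key_a, key_b):
        self.keys = (key_a, key_b)
        self.index = 0  # which key is required for the next valid service

    def write(self, value):
        """Return True for a valid service; advance to the other key only on a match."""
        if value == self.keys[self.index]:
            self.index ^= 1
            return True
        return False


def run(writes):
    comparator = AlternatingComparator(KEY_A, KEY_B)
    return [comparator.write(value) for value in writes]


in_step = run([KEY_A, KEY_B, KEY_A, KEY_B])  # software and comparator synchronised
stale = run([KEY_A, KEY_A])                  # overwrite skipped, old key written again
out_of_step = run([KEY_B, KEY_A])            # software starts on the wrong key

print(in_step, stale, out_of_step)
```

The last case shows the synchronisation burden mentioned above: a correctly computed key is rejected if the software
and the comparator disagree about which of the two keys is due.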

[0033] The software code required to perform the functions F1 to F5 may be distributed throughout the control
program on a time distribution basis so that the counter is serviced with the calculated key number at approximately
regular time intervals. Alternatively, the software code required to perform the functions F1 to F5 may be located in
the control program in close proximity with the most critical software modules of the main control program. In a
further alternative embodiment the software code for performing the functions may form an integral part of the most
critical software modules.

[0034] In the example illustrated in Figure 2, five functions, F1 to F5, are utilised in the sequence for computing
the key number, two of which are implemented within the self-test modules X and Y. However, it will be appreciated
that any number of functions, F1 to Fn, and any number of self-test modules may be employed. By utilising one or more
self-test modules, the method provides the advantage of a two-fold fault condition check, the or each self-test module
itself providing a means of detecting whether a fault condition has occurred in addition to the key number
calculation. It will also be appreciated that increasing the number of function stages in the key number calculation
increases the reliability of the method in detecting the occurrence of a fault in the processor.

[0035] Although it is not necessary for the self-test modules to be distributed throughout the control program
loaded onto the processor, it is important that the self-test modules are intimately involved in a part of the
calculation of the key number by including at least one of the functions, F1 to Fn, used to calculate the key number
within a self-test module. In this way, a check is made on the functioning of the self-test module itself.

[0036] It will be appreciated that the comparator 25 may be replaced with any suitable comparative circuitry for
comparing the key number, G, received at one input, with a predetermined value. The predetermined value need not
be the expected value of the key number, but may be any predetermined value, the difference between the key number,
G, and the predetermined value being used to determine whether service pulse 26 or output pulse 28 is generated.
Additionally, the OR gate 70 shown in Figure 1 may be replaced by logic circuitry which is implemented in software.
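
By way of illustration only, a comparison based on a difference from an arbitrary predetermined value can be sketched
in Python. The function name, the 16-bit wrap-around and the numerical values are assumptions made for the example.

```python
def comparator(key_number, predetermined, required_difference, mask=0xFFFF):
    """Choose the output pulse from the difference between key number and predetermined value."""
    difference = (key_number - predetermined) & mask
    if difference == required_difference:
        return "service pulse 26"
    return "output pulse 28"


# Hypothetical values: the predetermined value 2000 is not itself the expected key number.
print(comparator(2520, 2000, 520))  # difference matches
print(comparator(2350, 2000, 520))  # difference does not match
```
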
[0037] It will be appreciated that the method of the present invention may be implemented in combination with a
watchdog timer of a type other than that described with reference to Figure 1. For example, a more simplified
watchdog timer may be employed in which a counter, receiving a continuous clocked input pulse, provides a fault
condition output in the event that a predetermined maximum count is reached. If a correct key number is received at
the comparator, an output signal from the comparator is supplied to the counter to reset the count. Thus, in the event
that an incorrect key number is calculated, no reset signal is received by the counter, the predetermined maximum
count is reached and a fault condition output is generated.
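
By way of illustration only, the simplified watchdog timer can be sketched in Python. The class `CounterWatchdog`,
the key number 2520, the maximum count of 8 and the servicing interval of 5 clock pulses are assumptions made for the
example.

```python
class CounterWatchdog:
    """Counter that is reset only when the correct key number reaches the comparator."""

    def __init__(self, key, max_count):
        self.key = key
        self.max_count = max_count
        self.count = 0
        self.fault = False  # latched fault condition output

    def clock(self):
        """Apply one pulse of the continuous clocked input."""
        self.count += 1
        if self.count >= self.max_count:
            self.fault = True

    def write(self, key_number):
        """Comparator: a correct key number resets the count, anything else is ignored."""
        if key_number == self.key:
            self.count = 0


def run(key_written, cycles=4, clocks_per_cycle=5, key=2520, max_count=8):
    """Write `key_written` once every `clocks_per_cycle` pulses and report the fault output."""
    watchdog = CounterWatchdog(key, max_count)
    for _ in range(cycles):
        for _ in range(clocks_per_cycle):
            watchdog.clock()
        watchdog.write(key_written)
    return watchdog.fault


print(run(2520), run(2350))
```

In this sketch the correct key number keeps the count below the maximum, whereas an incorrect key number never resets
the counter and the fault condition output is generated.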

Claims

1. A method for detecting a fault condition in a computer processor operating a main control program, comprising the
steps of:

sequentially performing a plurality of functions (F1, F2, F3, F4, F5) on an initial input value so as to compute a
final value (G), the input value to each of the second and subsequent functions being provided by the output
value from the preceding function in the sequence;

loading at least one self-test module (X, Y) onto the computer processor for detecting whether a fault condition
has occurred in the computer processor, wherein at least one of the functions is carried out within a self-test
module (X, Y); and

comparing the computed final value (G) with a predetermined value to provide an indication of whether a fault
condition has occurred in the computer processor.

2. The method as claimed in Claim 1, wherein the computed final value (G) comprises two secondary computed values,
a correspondence being obtained when the secondary computed values are generated in a required sequence.

3. The method as claimed in Claim 1 or Claim 2, wherein the self-test modules (X, Y) are provided within the main 
control program operated by the computer. 

4. The method as claimed in any of Claims 1 to 3, the method comprising the step of performing the plurality of
functions (F1, F2, F3, F4, F5) in software using code distributed throughout the main control program on a time
distribution basis such that the computed final value (G) is calculated at substantially regular intervals.

5. The method as claimed in any of Claims 1 to 3, the method comprising the step of performing the plurality of
functions (F1, F2, F3, F4, F5) in software using code distributed throughout the main control program in close
proximity with a critical software module.

6. The method as claimed in any of Claims 1 to 3, the method comprising the step of performing the plurality of
functions (F1, F2, F3, F4, F5) in software using code forming an integral part of a critical software module.

7. The method as claimed in any of Claims 1 to 6, further comprising the steps of:

generating a service pulse (26) if the computed final value (G) is equivalent to the predetermined value;

generating a time window;

detecting whether the service pulse (26) is received within the time window; and

generating a fault condition output (80; 180) if the service pulse is received outside of the time window.

8. The method as claimed in Claim 7, further comprising the step of generating a fault condition output (80; 180) in
the event that the service pulse is received before the time window has been started, or after the time window has
expired.

9. The method as claimed in Claim 7, further comprising the steps of:

incrementing a count of counter means, the counter means providing a fault condition output in the event that
a pre-set count is reached; and

changing the count of the counter means in response to a correspondence between the computed final value
and the predetermined value, such that, in the event that no such correspondence occurs, the counter means
provides a fault condition output, thereby indicating that a fault condition has occurred in the computer
processor.

10. The method as claimed in Claim 9, wherein the counter means is reset to a zero count in response to a
correspondence between the computed final value and the predetermined value.

11. An apparatus for detecting a fault condition in a computer processor operating a main control program comprising:

means for sequentially performing a plurality of functions (F1, F2, F3, F4, F5) on an input value so as to compute
a final value (G), the input value to each of the second and subsequent functions being provided by the output
value from the preceding function in the sequence;

at least one self-test module (X, Y), loaded onto the computer processor, for detecting whether a fault condition
has occurred in the computer processor, wherein at least one of the functions is carried out within a self-test
module; and

means (100) for comparing the computed final value (G) with a predetermined value to provide an indication of
whether a fault condition has occurred in the computer processor.

12. The apparatus as claimed in Claim 11, further comprising:

means (25) for generating a service pulse (26) if the computed final value (G) is equivalent to the predetermined
value,

means (50) for generating a time window (55), and

means (60) for detecting whether the service pulse (26) is received within the time window (55), whereby
receipt of the service pulse (26) outside the time window (55) results in generation of a fault condition output
(80).

13. The apparatus as claimed in Claim 12, comprising a comparator (25) for generating the service pulse (26). 



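[Figure 1: schematic diagram of a conventional watchdog timer; drawing not reproduced.]

[Figure 2: flow diagram of the method, showing self-test modules (X) and (Y), the key number (G) and the fault
output signal 180; drawing not reproduced.]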



